Accelerated Mini-batch Randomized Block Coordinate Descent Method
Tuo Zhao, Mo Yu, Yiming Wang, Raman Arora, Han Liu
We consider regularized empirical risk minimization problems. In particular, we minimize the sum of a smooth empirical risk function and a nonsmooth regularization function. When the regularization function is block separable, we can solve the minimization problems in a randomized block coordinate descent (RBCD) manner. Existing RBCD methods usually decrease the objective value by exploiting the partial gradient of a randomly selected block of coordinates in each iteration. Thus they need all data to be accessible so that the partial gradient of the selected block can be obtained exactly.
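A minimal sketch of the kind of RBCD update the abstract describes, for a lasso-type problem whose $\ell_1$ regularizer is block separable. The problem instance, block partition, and step-size rule are illustrative assumptions, not the authors' method:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (the l1 norm is block separable)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def rbcd_lasso(A, b, lam, n_blocks=4, iters=1000, seed=0):
    """Randomized block coordinate descent for 0.5*||Ax-b||^2 + lam*||x||_1.

    Each iteration updates one randomly chosen coordinate block using the
    exact partial gradient A[:, blk].T @ (A @ x - b) -- note this requires
    access to the full data matrix A, as the abstract points out.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    blocks = np.array_split(np.arange(d), n_blocks)
    # a safe step size: inverse of the largest blockwise Lipschitz constant
    step = 1.0 / max(np.linalg.norm(A[:, blk], 2) ** 2 for blk in blocks)
    residual = A @ x - b
    for _ in range(iters):
        blk = blocks[rng.integers(n_blocks)]
        g = A[:, blk].T @ residual          # partial gradient of the smooth part
        x_new = soft_threshold(x[blk] - step * g, step * lam)
        residual += A[:, blk] @ (x_new - x[blk])   # keep residual in sync
        x[blk] = x_new
    return x
```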
Reviews: The Sound of APALM Clapping: Faster Nonsmooth Nonconvex Optimization with Stochastic Asynchronous PALM
I find the approach rather interesting; in particular, the broad and general definition of the problem makes the approach applicable to a wide range of problems. However, I was surprised by the absence of any reference to the seminal Robbins/Monro paper and to the recent developments in stochastic-gradient-descent-based sampling (see below). The authors do local gradient descent updates of coordinate blocks by computing partial gradients and adding noise in each asynchronous step. I was wondering how this relates to the "usual" stochastic gradient descent update: given that the locally computed partial gradient is based on delayed (noisy) variable states, a sequence of these noisy partial gradients should converge to the true partial gradient as well. Further, recent SGD-based sampling has shown that adding noise to the variable states obtained by noisy gradient updates (as the authors do as well) provides good samples of the distribution underlying the optimal variable setting, even in a non-convex setting. That being said, the submitted work remains valid, but it would have been interesting to compare the proposed approach to well-established stochastic gradient methods.
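The SGD-based sampling the reviewer alludes to can be sketched as stochastic gradient Langevin dynamics: gradient steps plus Gaussian noise whose iterates approximate posterior samples rather than a point estimate. The target distribution and step size below are illustrative assumptions:

```python
import numpy as np

def sgld_sample(grad_log_post, x0, step=1e-3, n_steps=5000, seed=0):
    """Minimal stochastic gradient Langevin dynamics (SGLD) sketch.

    Each step ascends the log posterior and adds Gaussian noise scaled by
    sqrt(2*step); in the long run the iterates behave like samples from
    the posterior, even for non-convex targets.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    samples = []
    for _ in range(n_steps):
        x = x + step * grad_log_post(x) + np.sqrt(2 * step) * rng.normal(size=x.shape)
        samples.append(x.copy())
    return np.array(samples)
```

For a standard normal target (`grad_log_post = lambda x: -x`), the post-burn-in iterates have mean near 0 and variance near 1.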
Evolutionary Reinforcement Learning via Cooperative Coevolution
Hu, Chengpeng, Liu, Jialin, Yao, Xin
Recently, evolutionary reinforcement learning has attracted much attention in various domains. Maintaining a population of actors, evolutionary reinforcement learning utilises the collected experiences to improve the behaviour policy through efficient exploration. However, the poor scalability of genetic operators limits the efficiency of optimising high-dimensional neural networks. To address this issue, this paper proposes a novel cooperative coevolutionary reinforcement learning (CoERL) algorithm. Inspired by cooperative coevolution, CoERL periodically and adaptively decomposes the policy optimisation problem into multiple subproblems and evolves a population of neural networks for each of the subproblems. Instead of using genetic operators, CoERL directly searches for partial gradients to update the policy. Updating the policy with partial gradients maintains consistency between the behaviour spaces of parents and offspring across generations. The experiences collected by the population are then used to improve the entire policy, which enhances sampling efficiency. Experiments on six benchmark locomotion tasks demonstrate that CoERL outperforms seven state-of-the-art algorithms and baselines. An ablation study verifies the unique contribution of CoERL's core ingredients.
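A toy sketch of the decomposition idea described above: partition the policy parameters into subproblems and update each block with its partial gradient. The random partition, the generic `grad_fn`, and the single synchronous step are simplifying assumptions, not CoERL itself:

```python
import numpy as np

def decompose(n_params, n_subproblems, rng):
    """Randomly partition parameter indices into disjoint subproblems,
    in the spirit of cooperative coevolution."""
    idx = rng.permutation(n_params)
    return np.array_split(idx, n_subproblems)

def coevolve_step(theta, grad_fn, n_subproblems=4, step=0.1, seed=0):
    """One cooperative-coevolution-style update: each subproblem 'owns'
    a block of the shared policy parameters, moves only that block along
    its partial gradient, and the blocks are recombined into one policy."""
    rng = np.random.default_rng(seed)
    g = grad_fn(theta)                 # each subproblem reads only its slice
    new_theta = theta.copy()
    for blk in decompose(theta.size, n_subproblems, rng):
        new_theta[blk] = theta[blk] - step * g[blk]   # partial-gradient update
    return new_theta
```

On a toy quadratic objective (gradient `2*theta`), one step shrinks every block toward the optimum at the origin.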
Sequential Gradient Coding For Straggler Mitigation
Krishnan, M. Nikhil, Ebrahimi, MohammadReza, Khisti, Ashish
In distributed computing, slower nodes (stragglers) usually become a bottleneck. Gradient Coding (GC), introduced by Tandon et al., is an efficient technique that uses principles of error-correcting codes to distribute gradient computation in the presence of stragglers. In this paper, we consider the distributed computation of a sequence of gradients $\{g(1),g(2),\ldots,g(J)\}$, where processing of each gradient $g(t)$ starts in round $t$ and finishes by round $(t+T)$. Here $T\geq 0$ denotes a delay parameter. In the GC scheme, coding is applied only across computing nodes, which results in a solution with $T=0$. On the other hand, having $T>0$ allows for designing schemes that exploit the temporal dimension as well. In this work, we propose two schemes that demonstrate improved performance compared to GC. Our first scheme combines GC with selective repetition of previously unfinished tasks and achieves improved straggler mitigation. In our second scheme, which constitutes our main contribution, we apply GC to a subset of the tasks and repetition for the remainder of the tasks. We then multiplex these two classes of tasks across workers and rounds in an adaptive manner, based on past straggler patterns. Using theoretical analysis, we demonstrate that our second scheme achieves a significant reduction in the computational load. In our experiments, we study a practical setting of concurrently training multiple neural networks over an AWS Lambda cluster involving 256 worker nodes, where our framework naturally applies. We demonstrate that the latter scheme can yield a 16% improvement in runtime over the baseline GC scheme, in the presence of naturally occurring, non-simulated stragglers.
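The $T=0$ baseline discussed above can be illustrated with the fractional-repetition variant of gradient coding from Tandon et al.: workers are split into groups of size $s+1$ that replicate the same data shard, so any $s$ stragglers can be tolerated. The helper names below are hypothetical:

```python
def assign_tasks(n_workers, s):
    """Fractional-repetition assignment: split workers into groups of size
    s+1; every worker in a group computes the same partial-gradient sum
    over that group's data shard. Returns the worker -> shard map."""
    assert n_workers % (s + 1) == 0, "group size must divide worker count"
    return [w // (s + 1) for w in range(n_workers)]

def recover_gradient(shard_grads, worker_shard, alive):
    """Recover the full gradient (sum over shards) from any set of
    non-straggler workers that covers every shard at least once."""
    seen, total = set(), 0.0
    for w in alive:
        sh = worker_shard[w]
        if sh not in seen:              # take one response per shard
            seen.add(sh)
            total = total + shard_grads[sh]
    assert len(seen) == len(shard_grads), "too many stragglers"
    return total
```

With 6 workers and $s=1$ there are 3 shards; any single straggler per group still leaves one alive worker holding each shard's result.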
Nested Gradient Codes for Straggler Mitigation in Distributed Machine Learning
Maßny, Luis, Hofmeister, Christoph, Egger, Maximilian, Bitar, Rawad, Wachter-Zeh, Antonia
We consider distributed learning in the presence of slow and unresponsive worker nodes, referred to as stragglers. In order to mitigate the effect of stragglers, gradient coding redundantly assigns partial computations to the workers such that the overall result can be recovered from only the non-straggling workers. Gradient codes are designed to tolerate a fixed number of stragglers. Since the number of stragglers in practice is random and unknown a priori, tolerating a fixed number of stragglers can yield a sub-optimal computation load and can result in higher latency. We propose a gradient coding scheme that can tolerate a flexible number of stragglers by carefully concatenating gradient codes for different straggler tolerances. By proper task scheduling and small additional signaling, our scheme adapts the computation load of the workers to the actual number of stragglers. We analyze the latency of our proposed scheme and show that it has a significantly lower latency than gradient codes.
Adaptive Stochastic Gradient Descent for Fast and Communication-Efficient Distributed Learning
Hanna, Serge Kas, Bitar, Rawad, Parag, Parimal, Dasari, Venkat, Rouayheb, Salim El
We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on $n$ workers, each having a subset of the data. Distributed SGD may suffer from the effect of stragglers, i.e., slow or unresponsive workers who cause delays. One solution studied in the literature is to wait at each iteration for the responses of the fastest $k
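The fastest-$k$ idea in the abstract above can be sketched as a single aggregation round: the master waits only for the $k$ quickest workers and averages their gradients, so the round time is the $k$-th smallest delay. The delay model and function name are illustrative assumptions:

```python
import numpy as np

def fastest_k_round(worker_grads, delays, k):
    """One round of fastest-k aggregation in distributed SGD: ignore the
    n-k slowest workers (stragglers), average the k fastest gradients,
    and report the time at which the k-th response arrived."""
    order = np.argsort(delays)          # workers sorted by response time
    chosen = order[:k]
    round_time = delays[chosen[-1]]     # k-th fastest response ends the round
    avg_grad = np.mean([worker_grads[w] for w in chosen], axis=0)
    return avg_grad, round_time
```

Smaller $k$ shortens each round but averages fewer gradients; the paper's adaptive scheme varies $k$ over the course of training to trade these off.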
Dynamic Network-Assisted D2D-Aided Coded Distributed Learning
Zeulin, Nikita, Galinina, Olga, Himayat, Nageen, Andreev, Sergey, Heath, Robert W. Jr
Today, various machine learning (ML) applications offer continuous data processing and real-time data analytics at the edge of a wireless network. Distributed real-time ML solutions are highly sensitive to the so-called straggler effect caused by resource heterogeneity and alleviated by various computation offloading mechanisms that seriously challenge the communication efficiency, especially in large-scale scenarios. To decrease the communication overhead, we rely on device-to-device (D2D) connectivity that improves spectrum utilization and allows efficient data exchange between devices in proximity. In particular, we design a novel D2D-aided coded federated learning method (D2D-CFL) for efficient load balancing across devices. The proposed solution captures system dynamics, including data (time-dependent learning model, varied intensity of data arrivals), device (diverse computational resources and volume of training data), and deployment (varied locations and D2D graph connectivity). To minimize the number of communication rounds, we derive an optimal compression rate for achieving minimum processing time and establish its connection with the convergence time. The resulting optimization problem provides suboptimal compression parameters, which reduce the total training time. Our proposed method is beneficial for real-time collaborative applications where the users continuously generate training data, resulting in model drift.
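Gradient compression of the kind whose rate the paper optimizes can be illustrated with generic top-$k$ sparsification; this is a stand-in operator chosen for concreteness, not the paper's derived optimal-rate scheme:

```python
import numpy as np

def topk_compress(grad, rate):
    """Keep only the largest-magnitude fraction `rate` of gradient entries
    and zero the rest, so only k values (plus indices) need transmitting.
    A generic compression operator for communication-efficient learning."""
    k = max(1, int(round(rate * grad.size)))
    idx = np.argpartition(np.abs(grad), -k)[-k:]   # indices of top-k magnitudes
    out = np.zeros_like(grad)
    out[idx] = grad[idx]
    return out
```

A higher rate transmits more entries per round (faster convergence, more communication); the paper ties the choice of rate to processing and convergence time.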
Losses, Dissonances, and Distortions
In recent years, there has been a growing interest in using machine learning models for creative purposes. In most cases, this is with the use of large generative models which, as their name implies, can generate high-quality and realistic outputs in music [Huang et al., 2019], images [Esser et al., 2021], text [Brown et al., 2020], and other domains. The standard approach for artistic creation using these models is to take a pre-trained model (or set of models) and use it to produce output. The artist directs the model's generation by "navigating" the latent space [Castro, 2020], fine-tuning the trained parameters [Dinculescu et al., 2019], or using the model's output to steer another generative process [White, 2019, Castro, 2019]. At a high level, what all these approaches do is convert the numerical signal of a machine learning model's output into art, whether implicitly or explicitly. However, in most (if not all) cases, they only do so after the initial model has been trained.